Course Information
Course Title
統計機器學習
Statistical Machine Learning 
Semester
107-1 
Intended Audience
College of Science, Institute of Applied Mathematical Sciences
Instructor
洪英超 
Course Number
MATH5095 
Course Identifier
221 U8380 
Section
 
Credits
3.0 
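Full/Half Year
Half year
Required/Elective
Elective
Class Time
Tuesday, periods 6, 7, 8 (13:20-16:20)
Location
Astronomy-Mathematics Building, Room 304 (天數304)
Remarks
Period 8 on Tuesdays meets in Astronomy-Mathematics Building, Room 301 (天數301). This course substitutes for Statistical Foundations of Data Science (II).
Restricted to students of this department or institute (including minor and double-major students), students of the College of Electrical Engineering and Computer Science (including minor and double-major students), or students of the College of Science (including minor and double-major students).
Enrollment cap: 30 students
Ceiba Course Website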
http://ceiba.ntu.edu.tw/1071MATH5095_SML 
Course Introduction Video
 
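Core Competency Mapping
No core competency mapping has been established for this course.
Course Syllabus
To protect everyone's rights, please respect intellectual property rights and do not make illegal photocopies.
Course Description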

This course introduces methods for extracting important patterns and information from data in the fundamental sciences and presents basic concepts of machine learning from a statistical perspective. In particular, it emphasizes selecting appropriate methods and justifying the choice, implementing the methods in code, and evaluating and effectively communicating results in data analysis reports. Topics include data preprocessing, visualization, statistical model selection and validation, matrix algebra, dimension reduction techniques, Bayesian decision theory, supervised and unsupervised learning, convex optimization, kernel methods, neural networks, and information theory.

Course Objectives
 
Course Requirements
Intermediate-level statistics (or mathematical statistics at an equivalent level)

Fundamental Probability Theory

Linear Algebra

Basic programming in R 
Expected Weekly Study Hours Outside Class
 
Office Hours
Remarks: Tuesday, 11:00-12:00 (Astronomy-Mathematics Building, Room 531 or 532), or by appointment via email: hungy@nccu.edu.tw
Required Reading
 
References
 
Grading
(for reference only)
   
Course Schedule
Week
Date
Topic
Week 1
2018/09/11  Grand Tour (Overview) 
Week 2
2018/09/18  Data Preprocessing, Visualization (GGobi), and R
Week 3
2018/09/25  Review of Matrix Algebra and Some Related Optimization Problems (slides updated on 9/29)
Week 4
2018/10/02  Supervised Learning: Review of Linear Regression, Model/Variable Selection (Best subset selection, forward and backward search, stepwise) 
Week 5
2018/10/09  Shrinkage methods and Regression in High Dimensions (ridge regression and lasso, Principal Component Regression (PCR) and Partial Least Squares (PLS) Regression). 
Week 6
2018/10/16  Supervised Learning: Loss, Risk, Bayesian inference and decision rule, parametric and non-parametric density estimation
 
Week 7
2018/10/23  Supervised Learning: Decision Trees, Linear Classifier, Quadratic Classifier, Logistic Classification, Nearest Neighbor Methods, Naïve Bayes.
The R code is also available.
Week 8
2018/10/30  Supervised Learning: Support Vector Machine, convex optimization, kernel methods.
 
Week 9
2018/11/06  Ensemble Learning: Bagging, Random Forest, Boosting, Stacking and Blending.
Week 10
2018/11/13  Midterm Exam Week (no class) 
Week 11
2018/11/20  Unsupervised Learning: Introduction to Hierarchical Clustering and Graph-Based Clustering. Lab: the olive oil data.
Week 11
2018/11/21  Final Project Timeline and Notes (updated, must read) 
Week 12
2018/11/27  Unsupervised Learning: K-means/medoids and their variants, Model-Based approaches (Gaussian Mixtures and EM algorithm), Self-Organizing Maps (SOM). 
Week 13
2018/12/04  Unsupervised Learning: Multidimensional Scaling (MDS, Dimension reduction method), Local MDS, Spectral Clustering. 
Week 14
2018/12/11  Unsupervised Learning: Correspondence Analysis (Dimension reduction method for pure categorical data, simple CA, MCA, JCA, and adjusted MCA)  
Week 14
2018/12/11  Final project proposal due at 24:00 on 12/11.
Week 15
2018/12/18  Unsupervised Learning: Canonical Correlation and Principal Component Analysis (CCA and PCA, Dimension reduction method) 
Week 16
2018/12/25  Unsupervised Learning: (Exploratory) Factor Analysis (FA, Dimension reduction method) and Independent Component Analysis (ICA based on Information Theory)
 
Week 17
2019/01/01  Holiday (no class) 
Week 18
2019/01/08  Final Presentation and Report Due